feat(skills): support loading skills from GitHub/Git URLs#2091
feat(skills): support loading skills from GitHub/Git URLs#2091dgallitelli wants to merge 6 commits intostrands-agents:mainfrom
Conversation
Add support for remote Git repository URLs as skill sources in AgentSkills, enabling teams to share and reference skills directly from GitHub without manual cloning. Closes strands-agents#2090 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Parse GitHub web URLs like /tree/<ref>/path to extract the clone URL, branch, and subdirectory path. This enables loading skills from subdirectories within mono-repos. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
01a4fc3 to
83966f6
Compare
|
/strands review |
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
|
|
||
| logger = logging.getLogger(__name__) | ||
|
|
||
| _DEFAULT_CACHE_DIR = Path.home() / ".cache" / "strands" / "skills" |
There was a problem hiding this comment.
Issue: _DEFAULT_CACHE_DIR = Path.home() / ".cache" / "strands" / "skills" is evaluated at module import time. In containerized environments (Lambda, Docker) where HOME may not be set, Path.home() raises RuntimeError.
Suggestion: Evaluate this lazily, e.g., inside clone_skill_repo when cache_dir is None. You could use a sentinel or simply inline Path.home() / ".cache" / "strands" / "skills" inside the function:
cache_dir = cache_dir or Path.home() / ".cache" / "strands" / "skills"This also respects XDG_CACHE_HOME if you want to be a good citizen on Linux:
import os
cache_dir = cache_dir or Path(os.environ.get("XDG_CACHE_HOME", Path.home() / ".cache")) / "strands" / "skills"| _DEFAULT_CACHE_DIR = Path.home() / ".cache" / "strands" / "skills" | ||
|
|
||
| # Patterns that indicate a string is a URL rather than a local path | ||
| _URL_PREFIXES = ("https://", "http://", "git@", "ssh://") |
There was a problem hiding this comment.
Issue: http:// is included in URL prefixes, allowing skill repos to be cloned over plaintext HTTP. This exposes users to man-in-the-middle attacks where a malicious actor could inject arbitrary code into the cloned skill.
Suggestion: Remove "http://" from _URL_PREFIXES, or at minimum log a warning when an http:// URL is detected in clone_skill_repo. Git itself warns about this, but the SDK should guide users toward secure defaults (tenet: "the obvious path is the happy path").
| key = cache_key(url, ref) | ||
| target = cache_dir / key | ||
|
|
||
| if not target.exists(): |
There was a problem hiding this comment.
Issue: Race condition — if two threads/processes call clone_skill_repo concurrently with the same URL, both pass the not target.exists() check and attempt to clone into the same directory. One will fail or produce a corrupt state.
Suggestion: Use an atomic pattern: clone to a temporary directory first, then os.rename() (atomic on the same filesystem) to the target path. If the target already exists by the time rename happens, just delete the temp clone. Something like:
import tempfile
with tempfile.TemporaryDirectory(dir=cache_dir) as tmp:
tmp_target = Path(tmp) / "repo"
# ... clone into tmp_target ...
try:
tmp_target.rename(target)
except OSError:
# Another process won the race — that's fine, use their clone
pass| return hashlib.sha256(key_input.encode()).hexdigest()[:16] | ||
|
|
||
|
|
||
| def clone_skill_repo( |
There was a problem hiding this comment.
Issue: There is no mechanism to refresh a cached clone. When no ref is specified, the default branch is cloned. If the remote repo is updated, the user gets stale content indefinitely with no obvious way to refresh (other than manually deleting ~/.cache/strands/skills/).
Suggestion: Consider adding a force_refresh: bool = False parameter (or documenting the cache invalidation story clearly). At minimum, document in the docstring that users should delete the cache directory or pin a ref for reproducibility. This is especially important since the cache key is a hash — users can't easily find and delete the right directory.
| List of Skill instances loaded from the repository. | ||
|
|
||
| Raises: | ||
| RuntimeError: If the repository cannot be cloned or ``git`` is not |
There was a problem hiding this comment.
Issue: The Raises section documents RuntimeError but the method also raises ValueError (via is_url check on line 377). Additionally, calling Skill.from_url() with a non-URL raises ValueError, but using the same string through AgentSkills(skills=["./path"]) treats it as a local path. This inconsistency could confuse users.
Suggestion: Add ValueError to the Raises section in the docstring.
|
Issue: This PR introduces new public API surface ( Key questions an API reviewer should evaluate:
|
|
Issue: The Suggestion: Add |
| from ._url_loader import clone_skill_repo, is_url, parse_url_ref | ||
|
|
||
| if not is_url(url): | ||
| raise ValueError(f"url=<{url}> | not a valid remote URL") |
There was a problem hiding this comment.
Issue: The from_url ValueError uses an f-string in the exception message: f"url=<{url}> | not a valid remote URL". This is consistent with the structured logging style used elsewhere, but the AGENTS.md specifically says "Don't use f-strings in logging calls." Since this is an exception message (not a logging call), this is acceptable — but the logging calls throughout the module correctly use %s format, which is good.
However, the RuntimeError raised in clone_skill_repo at line 189 also uses an f-string: f"url=<{url}>, ref=<{ref}> | failed to clone...". This is fine since it's an exception message, not a log statement.
Review SummaryAssessment: Request Changes This is a well-structured feature that fills a real user need. The design decisions (subprocess git, shallow clone, hash-based caching) are sound and well-documented in the PR description. The test coverage is thorough with 32 new tests. Review Categories
The core implementation is solid — addressing the security and reliability items above would make this ready to merge. |
- Security: remove http:// support (MITM risk), only allow https/ssh/git@ - Reliability: atomic clone via tempdir + rename to prevent race conditions - Reliability: evaluate cache dir lazily (not at import time) for containers - Usability: respect XDG_CACHE_HOME for cache directory - Usability: add force_refresh parameter to re-clone stale unpinned refs - Docs: add ValueError to Skill.from_url() Raises section - Docs: add _url_loader.py to AGENTS.md directory tree - Tests: add coverage for XDG_CACHE_HOME, force_refresh, race condition, http:// rejection (184 tests total) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Per maintainer feedback, remove all git/SSH/cache logic and replace with simple HTTPS fetch of SKILL.md content via urllib (stdlib). - Drop git clone, subprocess, caching, tempdir, race condition handling - Only support https:// URLs (no git@, ssh://, http://) - Auto-resolve GitHub web URLs to raw.githubusercontent.com - Skill.from_url() now returns a single Skill (not list) - Remove cache_dir, force_refresh params from public API - Net -532 lines, zero new dependencies Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
@mkmeral Great call — I've completely reworked this to drop all git/SSH/cache logic. The implementation is now HTTPS-only with zero new dependencies. What changedRemoved:
New approach:
Net result: -532 lines, zero dependencies, sandbox-friendly. Supported URL patterns# Direct raw URL
Skill.from_url("https://raw.githubusercontent.com/org/repo/main/SKILL.md")
# GitHub repo root (resolves to raw SKILL.md at root)
Skill.from_url("https://github.com/org/skill-repo")
# GitHub repo with @ref
Skill.from_url("https://github.com/org/skill-repo@v1.0.0")
# GitHub /tree/ URL (resolves to raw SKILL.md in that directory)
Skill.from_url("https://github.com/org/repo/tree/main/skills/my-skill")
# Any HTTPS URL hosting a SKILL.md
Skill.from_url("https://example.com/path/to/SKILL.md")All tested manually with real URLs (task-reviewer from your repo, brainstorming from obra/superpowers, data-agent from dgallitelli/aws-data-agent-skill) and via a live Bedrock agent. |
| Each source can be a Skill instance, a path to a skill directory, | ||
| or a path to a parent directory containing multiple skills. | ||
| a path to a parent directory containing multiple skills, or a remote | ||
| Git URL. |
There was a problem hiding this comment.
Issue: This docstring says "remote Git URL" but the implementation no longer uses git — it only supports https:// URLs fetched via urllib. This is stale from the v1 implementation.
Suggestion: Update to match the current implementation:
a path to a parent directory containing multiple skills, or an
HTTPS URL pointing to a SKILL.md file.
AGENTS.md
Outdated
| │ │ │ ├── core/ # Base classes, actions, context | ||
| │ │ │ └── handlers/ # Handler implementations (e.g., LLM) | ||
| │ │ └── skills/ # AgentSkills.io integration (Skill, AgentSkills) | ||
| │ │ └── _url_loader.py # Remote URL loading, git clone, caching |
There was a problem hiding this comment.
Issue: The AGENTS.md entry for _url_loader.py says "Remote URL loading, git clone, caching" which describes the v1 implementation. The module now does HTTPS fetching with no git or caching.
Suggestion: Update to:
│ │ └── _url_loader.py # HTTPS skill fetching, GitHub URL resolution
| return url | ||
|
|
||
| # Not a github.com URL — return as-is (user provides a direct link) | ||
| if not url.startswith("https://github.com"): |
There was a problem hiding this comment.
Issue: resolve_to_raw_url only handles GitHub-specific URL patterns (github.com → raw.githubusercontent.com). For non-GitHub URLs that aren't direct links to a SKILL.md file, the URL is returned as-is. If a user passes https://gitlab.com/org/repo, the function will return it unchanged and fetch_skill_content will try to fetch an HTML page (GitLab repo page), not the raw SKILL.md content.
This isn't necessarily a bug — the function documents that non-GitHub URLs pass through — but the error experience could be confusing: the user gets a RuntimeError from a failed parse of HTML content, not a clear "unsupported host" message.
Suggestion: Consider logging a debug message for non-GitHub URLs to make it clear the SDK expects a direct link to SKILL.md content, or add a brief note in the Skill.from_url docstring that only GitHub URLs get automatic resolution and other hosts require a direct link to the raw SKILL.md file.
| logger.info("url=<%s> | fetching skill content from %s", url, resolved) | ||
|
|
||
| try: | ||
| req = urllib.request.Request(resolved, headers={"User-Agent": "strands-agents-sdk"}) # noqa: S310 |
There was a problem hiding this comment.
Issue: The S310 noqa suppression is appropriate here since the URL is validated to be https:// by the caller (Skill.from_url calls is_url first). However, fetch_skill_content itself doesn't validate that url starts with https:// — it could be called directly with any URL. Since this is a private module, the risk is low, but a defensive check would make the function safer independently.
Suggestion: Consider adding a quick guard:
if not url.startswith("https://"):
raise ValueError(f"url=<{url}> | only https:// URLs are supported")This would make the S310 suppression more clearly justified.
| if owner_repo.count("/") == 1: | ||
| return f"https://raw.githubusercontent.com/{owner_repo}/{ref}/SKILL.md" | ||
|
|
||
| # Unrecognised GitHub URL pattern — return as-is |
There was a problem hiding this comment.
Issue: There's no test for the fallback case on line 113 where a GitHub URL has more than 2 path segments (e.g., https://github.com/org/repo/something-unexpected) and falls through to return the URL as-is. This is the "Unrecognised GitHub URL pattern" branch. Codecov also reports some missing coverage in this file.
Suggestion: Add a test case like:
def test_unrecognized_github_path(self):
url = "https://github.com/org/repo/wiki/Some-Page"
assert resolve_to_raw_url(url) == url|
Issue: This PR adds new public API surface ( Per the API bar raising guidelines, this is a moderate change (adding a new classmethod to an existing class that customers will use directly). The PR description is well-prepared with use cases and example code — this is great. Consider adding the A documentation PR should be opened to document |
Review SummaryAssessment: Request Changes The v2 rewrite is a big improvement — dropping git/SSH/cache in favor of HTTPS-only fetch with Review Categories
Great work on the simplification — this is close to ready. |
- Fix mypy error: annotate response.read().decode() return type - Add defensive https:// check in fetch_skill_content() - Fix stale "Git URL" docstring in _resolve_skills() - Fix stale "git clone, caching" description in AGENTS.md - Add note that non-GitHub URLs must point directly to SKILL.md - Add tests for unrecognized GitHub URL fallthrough and http:// rejection Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
| return source.startswith("https://") | ||
|
|
||
|
|
||
| def resolve_to_raw_url(url: str) -> str: |
There was a problem hiding this comment.
I'm don't think we should be doing the raw url translation. I'd say this is an application concern, that they should pass correct URLs to the Skill.from_url
This PR should be targeted at loading skills from URL, should not be opinionated on what that url is. At the end of the day, it's up to the devs to pass in the correct url
Skill.from_url should essentially be: get url as input, make GET request to get content from the url, pass it down to Skill.from_content. We should not have any complex resolution
There was a problem hiding this comment.
Done — removed all the GitHub URL resolution logic in the latest push. Skill.from_url() is now just: validate https://, GET the URL, pass to from_content(). ~65 lines total in _url_loader.py, no opinions on URL format.
Document Skill.from_url() and HTTPS URL support in AgentSkills: - New "Loading skills from URLs" section with examples - Add from_url() to programmatic creation examples - Update configuration table to mention URL sources - Note about resource directory limitations for URL-loaded skills Companion to strands-agents/sdk-python#2091 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Documentation PR: strands-agents/docs#756 |
Per review comment, Skill.from_url() should not be opinionated about URL format. Remove all resolve_to_raw_url logic — the method now simply GETs the provided URL and passes content to from_content(). Users are responsible for providing a direct link to raw SKILL.md content. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
| Returns: | ||
| Dict mapping skill names to Skill instances. | ||
| """ | ||
| from ._url_loader import is_url |
There was a problem hiding this comment.
do we explicitly need url loader import in method? if not, can we move to top?
| ValueError: If ``url`` is not an ``https://`` URL. | ||
| RuntimeError: If the SKILL.md content cannot be fetched. | ||
| """ | ||
| from ._url_loader import fetch_skill_content, is_url |
There was a problem hiding this comment.
we should not have imports in methods, unless necessary. let's move this to the top
| return source.startswith("https://") | ||
|
|
||
|
|
||
| def fetch_skill_content(url: str) -> str: |
There was a problem hiding this comment.
nit: do we need another file? can we just move this under Skill.from_url (or like as helper methods there)?
Summary
Closes #2090
AgentSkillsandSkill.from_url()https://,git@, andssh://URLs with optional@refsuffix for branch/tag pinning (e.g.,https://github.com/org/skill@v1.0.0)--depth 1) and cached locally at~/.cache/strands/skills/(configurable viacache_dirparameter)Usage
Changes
_url_loader.py(new)skill.pySkill.from_url()classmethodagent_skills.py_resolve_skills()to detect URL strings; addedcache_dirparametertest_url_loader.py(new)test_skill.pySkill.from_url()test_agent_skills.pyAgentSkillsDesign decisions
gitwhich is universally available. Handles auth naturally (SSH keys, credential helpers, tokens).--depth 1minimizes bandwidth and disk for skill repos.sha256(url + ref)[:16]as cache directory name. Repeated loads are instant.subprocess,hashlib,shutil).Test plan
@hookdecorator typing)aws-data-agent-skill)obra/superpowers/brainstorming)